
  • New to Stata | Linear regression in Stata

    Hi,

    I am new to Stata and I am trying to do a linear regression analysis for a college project.

    I am getting the error message "matrix not positive definite" when I run the 'reg' command.

    Further, when I run the 'vif' command to check for multicollinearity, the error message reads: "not appropriate after regress, nocons; use option uncentered to get uncentered VIFs".

    Please help
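The wording of the second message suggests the model was fitted with the noconstant option. The question does not show that command, so this is an inference. A minimal do-file sketch on simulated data, with made-up variables y, x1 and x2 (not the NBA data), shows the behaviour and the uncentered option the message points to:

```stata
* Simulated data, purely for illustration (not the NBA data)
clear
set seed 12345
set obs 100
generate x1 = rnormal()
generate x2 = x1 + rnormal()
generate y  = 1 + 2*x1 - x2 + rnormal()

* A model without an intercept
regress y x1 x2, noconstant

* Plain -estat vif- refuses to run after a no-constant model;
* -capture- keeps the do-file going so the return code can be inspected
capture estat vif
local rc_plain = _rc
display "return code of plain estat vif: `rc_plain'"

* The option named in the error message gives uncentered VIFs instead
estat vif, uncentered

* In most applications the simpler fix is to keep the intercept
regress y x1 x2
estat vif
```

Dropping the constant is rarely justified in a salary regression, so refitting with the intercept is usually preferable to switching to uncentered VIFs.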

  • #2
    See Advice on posting to Statalist.

    What's your data? What's the exact command you used?



    • #3
      Hi Jean-Claude Arbaut,

      My data come from Gretl data file 7-20.

      The command I ran was this:
      reg SALARY YRS HT WT AGE GAMES GAMESTRT FORWARD GUARD MIN FGA FGPRCNT FTA FTPRCNT REBOUNDS ASSISTS STEALS BLOCKS POINTS AVGPNTS RACE EW TRD WINTM ALLSTAR XPAN

      I am getting an error message which reads: "matrix not positive definite".



      • #4
        OK, first let me fill in the blanks, as "Gretl data file 7-20" is not very explicit. The source distribution of Gretl, which can be found here, does indeed have a data file named data7-20.gdt, in the directory gretl-2019a\share\data\ramanathan\. Gretl files are stored in a kind of XML format that is relatively easy to import into Stata by hand. The data file states: "Data on NBA players' salaries and their determinants compiled by Michael Pepek". It has 26 variables and 56 observations. The files in this directory come from Ramu Ramanathan's Introductory Econometrics with Applications, and they can also be found in Excel format here.

        I am not sure why you believe it's wise to just throw all the variables at -regress-, but, as a matter of fact, I get no error (using the latest update of Stata 15.1 on Windows).

        Note, however, that most coefficients have large p-values, which should come as no surprise with 25 predictors and only 56 observations.

        Code:
              Source |       SS           df       MS      Number of obs   =        56
        -------------+----------------------------------   F(25, 30)       =      1.96
               Model |  15215965.8        25  608638.632   Prob > F        =    0.0398
            Residual |  9319718.12        30  310657.271   R-squared       =    0.6202
        -------------+----------------------------------   Adj R-squared   =    0.3036
               Total |  24535683.9        55  446103.344   Root MSE        =    557.37
        
        ------------------------------------------------------------------------------
              SALARY |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 YRS |   128.1303   133.5022     0.96   0.345    -144.5176    400.7783
                  HT |  -35.11119    69.0725    -0.51   0.615    -176.1761    105.9537
                  WT |   4.434424   9.095008     0.49   0.629    -14.14006    23.00891
                 AGE |  -132.0913   131.1246    -1.01   0.322    -399.8834    135.7008
               GAMES |   27.34536   20.43367     1.34   0.191    -14.38576    69.07648
            GAMESTRT |   4.372733   7.472166     0.59   0.563    -10.88747    19.63293
             FORWARD |  -362.2436   321.0887    -1.13   0.268    -1017.994     293.507
               GUARD |  -1269.289   564.6327    -2.25   0.032    -2422.423   -116.1551
                 MIN |  -.4853707   .4743311    -1.02   0.314    -1.454084    .4833426
                 FGA |  -4.443802   2.684677    -1.66   0.108    -9.926643     1.03904
             FGPRCNT |  -10676.55   6494.316    -1.64   0.111    -23939.71    2586.613
                 FTA |   -3.23988   2.239173    -1.45   0.158    -7.812882    1.333122
             FTPRCNT |   -3518.25   1839.208    -1.91   0.065    -7274.414    237.9133
            REBOUNDS |  -.7522268   .6883566    -1.09   0.283    -2.158038    .6535849
             ASSISTS |   1.069499   .7430943     1.44   0.160    -.4481017      2.5871
              STEALS |  -.6288165   1.237957    -0.51   0.615    -3.157062    1.899429
              BLOCKS |   .7969876   1.521496     0.52   0.604    -2.310322    3.904297
              POINTS |    3.10725   2.525319     1.23   0.228     -2.05014    8.264639
             AVGPNTS |   165.0579    62.8582     2.63   0.013     36.68431    293.4314
                RACE |  -13.41989   225.5038    -0.06   0.953      -473.96    447.1202
                  EW |  -145.4598    186.694    -0.78   0.442    -526.7399    235.8203
                 TRD |  -298.5147   298.5381    -1.00   0.325    -908.2107    311.1814
               WINTM |   13.94025   210.2766     0.07   0.948    -415.5018    443.3823
             ALLSTAR |   506.9222   235.8301     2.15   0.040     25.29297    988.5515
                XPAN |  -1111.609   665.9898    -1.67   0.106    -2471.741    248.5239
               _cons |   12572.83    7250.27     1.73   0.093      -2234.2    27379.85
        ------------------------------------------------------------------------------
        Jean-Claude Arbaut
        Last edited by Jean-Claude Arbaut; 30 Apr 2019, 05:15.
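For anyone who wants to reproduce this, a minimal do-file sketch follows. It assumes the data listed with -dataex- in post #5 are already in memory, and it uses /// line continuation, which works in do-files but not when typed at the command line:

```stata
* Assumes the NBA salary data from post #5 are in memory
regress SALARY YRS HT WT AGE GAMES GAMESTRT FORWARD GUARD MIN FGA FGPRCNT ///
    FTA FTPRCNT REBOUNDS ASSISTS STEALS BLOCKS POINTS AVGPNTS RACE EW TRD  ///
    WINTM ALLSTAR XPAN

* 56 observations and 25 predictors leave 30 residual degrees of freedom,
* matching the F(25, 30) in the header of the output
display "N = " e(N) "   model df = " e(df_m) "   residual df = " e(df_r)
```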



        • #5
          Thanks to Jean-Claude for the links. It works for me too. A pirated or corrupted copy of Stata could cause problems like this, but first make sure your Stata is up to date. Type

          update all

          and follow the instructions.
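Before and after updating, it can help to record exactly which release is installed. A short sketch using official commands; update query and update all both need an Internet connection:

```stata
* Report the installed Stata release, edition and update date
about

* Check StataCorp's server for newer updates without installing anything
update query

* Download and install all available updates, then follow the instructions
update all
```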

          Here is the data if anyone wants to try it. Rochan, in the future it would also be a good idea to use dataex to post a sample of your data, especially in a case like yours, where the whole data set is small enough to be listed in full.

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input int SALARY byte(YRS HT) int WT byte(AGE GAMES GAMESTRT FORWARD GUARD) int(MIN FGA) double FGPRCNT int FTA double FTPRCNT int(REBOUNDS ASSISTS STEALS BLOCKS POINTS) double AVGPNTS byte(RACE EW TRD WINTM ALLSTAR XPAN)
          3750  4 84 240 27 80 80 0 0 2896 1282 .567 484 .746  240  188 117 281 1815 22.7 1 1 0 1 1 0
          3330  4 79 215 26 82 82 1 0 3093 1630 .509 553 .892  483  415 176  39 2176 26.5 0 0 0 0 1 0
          3100 10 81 222 30 77 77 0 1 2886 1137 .509 563 .911  607  988 138  22 1730 22.5 1 0 0 1 1 0
          3000  6 78 253 26 79 79 1 0 3088 1208 .579 799 .753  986  325 126  67 2037 25.8 1 1 0 1 1 0
          2750 10 81 220 33  6  6 1 0  189  104 .471  19 .847   37   29   6   5  116 19.3 0 1 0 0 1 0
          2500  6 78 198 26 81 81 0 1 3255 1795 .538 793  .85  652  650 234  65 2633 32.5 1 1 0 1 1 0
          2500  6 84 250 26 82 82 0 0 3024 1556 .508 652 .696 1105  149 213 282 2039 24.8 1 0 0 1 1 0
          2500  4 81 250 25 79 77 1 0 3000  472 .523 160 .744  299   53  20  20  613 19.8 1 0 1 0 0 0
          2200  4 84 260 26 74 22 0 0 1531  269 .524 114 .553  453   56  54  98  345  4.7 0 1 0 1 0 0
          2140  2 74 180 23 81 81 0 1 3179 1128 .505 576 .882  340  991 135  24 1650 20.4 1 0 0 1 1 0
          2140  4 75 195 26 81 81 0 1 3102 1146 .471 324  .84  367  770 146   8 1431 17.7 0 0 0 0 0 0
          2100  8 73 185 28 80 76 0 1 2924 1227 .464 351 .818  273  663 133  20 1458 18.2 1 1 0 1 1 0
          2100  7 80 200 29 80 80 1 0 2997 1756 .464 524 .844  553  211 117  52 2099 26.2 1 1 0 1 1 0
          2100 13 82 255 34 81 80 0 0 2878 1096 .491 711 .789  956  112  79 100 1637 20.2 1 1 0 1 0 0
          2100  3 84 280 25 79 79 0 0 2662 1161 .477 428 .757  635   60  56  49 1432 18.1 1 0 0 0 0 0
          2100  6 88 230 29 61 36 0 0 1086  365 .449  95 .653  307   77  31  65  393  6.4 1 0 1 0 0 0
          2100 13 79 190 35 82 82 1 0 2990 1881 .491 379 .858  326  383  66  12 2175 26.5 1 0 0 0 0 0
          2000 13 84 230 36 80 80 0 0 2840 1045  .57 409 .719  996  175  72 116 1486 18.6 1 1 0 0 1 0
          2100  1 82 230 24 26 18 1 0  950  358 .494 103 .767  171   81  44  25  434 16.7 1 0 0 0 0 0
          1800  3 81 254 26 80 80 1 0 3126 1559 .519 918 .766  853  219 144  70 2326 29.1 1 0 0 1 0 0
          1800  8 82 230 30 81 81 1 0 3002 1643 .471 598 .851  684  231  87  55 2085 25.7 0 0 0 1 1 0
          1700  7 81 235 28 80 78 1 0 2824 1563 .467 460 .787  650  198 106  72 1829 22.9 0 0 1 0 0 0
          1700  9 83 260 30 82 82 0 0 2739 1166 .477 341 .871  769  138  46 106 1409 17.2 0 1 0 1 0 0
          1700  7 75 175 29 71 71 0 1 2745 1221 .457 344 .785  662  559 195  20 1409 19.8 1 0 0 0 1 0
          1500  8 83 260 30 76 64 0 0 1917  607 .522 402 .826  500  105  42  81  969 12.8 0 0 0 0 0 0
          1500  6 79 215 29 82 82 0 1 3190 1710 .501 462 .816  342  164 108  22 2253 27.5 1 0 0 1 0 0
          1500  7 81 225 28 81 81 1 0 2960 1282 .548 321 .782  489  288 108  56 1657 20.5 1 0 0 1 1 0
          1500 11 79 215 33 81 81 1 0 2559 1371 .477 441 .819  384  294  64  13 1674 20.7 1 1 0 0 0 0
          1500  4 81 230 26 82 82 1 0 2510  758 .529 359 .786  739  103  94  55 1088 13.3 1 0 0 1 1 0
          1330  4 75 190 26 69 67 0 1 2409  903 .505 306  .85  172  390  63   5 1186 17.2 1 1 0 1 0 0
          1320 10 85 245 32 78 76 0 0 2333  768 .475 308 .766  521   90  21  41  966 12.4 1 1 0 1 0 0
          1300  7 88 290 32 81 82 0 0 2914  407 .462 200  .66  843   83  40 315  508  6.2 0 0 0 1 0 0
          1300  8 80 230 29 74 72 1 0 2446  702 .531 320 .666  696   78  61  36  959   13 1 0 1 0 0 0
          1300  5 82 235 27 66 40 1 0 1633  606 .459 154 .747  413   81  47  68  671 10.2 0 0 1 0 0 0
          1200  8 84 240 31 82 82 0 0 1806  543 .499 178 .646  545   54  28 180  657    8 1 0 1 0 0 0
          1200  8 85 255 31 64 62 0 0 1996  810 .448 220   .8  473  105  71  81  902 14.1 1 1 1 0 0 0
          1200 12 83 255 34 80 80 0 0 2587  835 .431 294 .905  623  289  85  61 1068 13.4 0 1 1 1 0 0
          1200  9 82 225 32 78 74 1 0 2876 1211 .546 533 .818  637  172  26  97 1758 22.5 0 1 0 0 1 0
          1200  8 78 200 30 78 78 0 1 2946 1249 .476 370 .854  273  288  65  20 1534 19.7 1 0 0 0 1 0
          1200  5 85 240 28 20  0 0 0  412  153 .451  49 .571  106   36   7  33  171  8.6 1 1 1 0 0 0
          1100  4 84 255 25 79 62 0 0 2585  907 .541 426 .744  696  157  57 221 1299 16.4 1 0 0 0 0 0
          1100  2 75 205 24 72 72 0 1 2477 1025 .467 258 .698  341  619 139   7 1219 16.9 1 1 0 1 0 0
          1100  5 83 235 27 82 82 1 0 3135  961 .542 450 .729  787  202  82  37 1370 16.7 1 0 1 1 0 0
          1100  3 85 255 25 78 78 0 0 2821 1012 .538 524 .737  718  285  63  40 1475 18.9 1 1 0 1 0 0
          1100  6 76 195 28 74 73 0 1 2605 1198 .491 226 .863  302  231 114  27 1448 19.6 1 0 0 1 0 0
          1100 11 82 220 33 79 72 0 0 1865  444 .495 106 .745  619   78  36 211  520  6.6 1 0 1 0 0 0
          1100  2 81 245 25 74 60 1 0 2120  930 .503 323 .743  541   52 564  27 1176 15.9 1 0 0 1 0 0
          1100  6 79 215 27 78 78 0 1 3006 1672 .496 548 .799  615  450 213  59 2123 27.2 1 0 0 0 1 0
          1050  5 73 175 27 82 82 0 1 3171  923 .538 452 .863  248 1118 263  14 1400 17.1 0 0 0 1 1 0
          1000  4 81 245 26 82 82 1 0 2604  835  .51 255 .773  861  187 104  14 1061 12.9 1 1 0 1 0 0
          1000  6 76 205 28 76 75 0 1 2418 1410  .48 340 .871  179  219  39  14 1651 21.7 1 1 0 0 0 0
          1000  6 83 232 28 82  3 1 0 2777 1272 .483 440 .825  447  138  48  91 1595 19.5 1 0 0 1 0 0
          1000  3 73 175 25 75 74 0 1 2728 1006 .526 292 .901  226  631 115   7 1414 18.9 0 1 0 1 0 0
          1000  8 82 217 30 73 72 1 0 2526  920 .539 334 .799  581  159  57 206 1259 17.2 1 1 0 0 0 0
          1000  6 81 220 28 82 39 1 0 2542 1027 .482 420 .757  522   71  34 121 1308   16 1 1 0 0 0 0
          1000  8 78 220 30 71 65 1 0 2302 1215 .467 508 .866  267  224  88  16 1606 22.6 0 1 0 0 0 1
          end
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          StataNow Version: 19.5 MP (2 processor)

          EMAIL: [email protected]
          WWW: https://academicweb.nd.edu/~rwilliam/
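As a sketch of how -dataex- can be used on your own data: it accepts a variable list and if/in qualifiers, and its count() option limits how many observations are listed (100 by default). The example assumes the data above are in memory; in older releases the command may first need to be installed with ssc install dataex, as the header of the listing notes:

```stata
* Assumes the data listed above are in memory
* A subset of variables and only the first 10 observations
dataex SALARY GAMES POINTS AVGPNTS, count(10)

* Or restrict the rows with an in qualifier, e.g. the first 20 observations
dataex SALARY GAMES POINTS AVGPNTS in 1/20
```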



          • #6
            First off, I add my thanks to Jean-Claude for his skilful detection of the unknown dataset.

            Rochan:
            I'm probably late to the party, but running a few -regress postestimation- commands shows that your model suffers from all the usual drawbacks a linear regression model can have (i.e., quasi-extreme multicollinearity, heteroskedasticity, and evidence of nonlinearity between some predictors and the regressand, which the RESET test flags as possible omitted variables):
            Code:
            . reg SALARY YRS HT WT AGE GAMES GAMESTRT FORWARD GUARD MIN FGA FGPRCNT FTA FTPRCNT REBOUNDS ASSISTS STEALS BLOCKS POINTS AVGPNTS RACE EW TRD WINTM ALLSTAR XPAN
            
                  Source |       SS           df       MS      Number of obs   =        56
            -------------+----------------------------------   F(25, 30)       =      1.96
                   Model |  15215965.8        25  608638.632   Prob > F        =    0.0398
                Residual |  9319718.12        30  310657.271   R-squared       =    0.6202
            -------------+----------------------------------   Adj R-squared   =    0.3036
                   Total |  24535683.9        55  446103.344   Root MSE        =    557.37
            
            ------------------------------------------------------------------------------
                  SALARY |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                     YRS |   128.1303   133.5022     0.96   0.345    -144.5176    400.7783
                      HT |  -35.11119    69.0725    -0.51   0.615    -176.1761    105.9537
                      WT |   4.434424   9.095008     0.49   0.629    -14.14006    23.00891
                     AGE |  -132.0913   131.1246    -1.01   0.322    -399.8834    135.7008
                   GAMES |   27.34536   20.43367     1.34   0.191    -14.38576    69.07648
                GAMESTRT |   4.372733   7.472166     0.59   0.563    -10.88747    19.63293
                 FORWARD |  -362.2436   321.0887    -1.13   0.268    -1017.994     293.507
                   GUARD |  -1269.289   564.6327    -2.25   0.032    -2422.423   -116.1551
                     MIN |  -.4853707   .4743311    -1.02   0.314    -1.454084    .4833426
                     FGA |  -4.443802   2.684677    -1.66   0.108    -9.926643     1.03904
                 FGPRCNT |  -10676.55   6494.316    -1.64   0.111    -23939.71    2586.613
                     FTA |   -3.23988   2.239173    -1.45   0.158    -7.812882    1.333122
                 FTPRCNT |   -3518.25   1839.208    -1.91   0.065    -7274.414    237.9133
                REBOUNDS |  -.7522268   .6883566    -1.09   0.283    -2.158038    .6535849
                 ASSISTS |   1.069499   .7430943     1.44   0.160    -.4481017      2.5871
                  STEALS |  -.6288165   1.237957    -0.51   0.615    -3.157062    1.899429
                  BLOCKS |   .7969876   1.521496     0.52   0.604    -2.310322    3.904297
                  POINTS |    3.10725   2.525319     1.23   0.228     -2.05014    8.264639
                 AVGPNTS |   165.0579    62.8582     2.63   0.013     36.68431    293.4314
                    RACE |  -13.41989   225.5038    -0.06   0.953      -473.96    447.1202
                      EW |  -145.4598    186.694    -0.78   0.442    -526.7399    235.8203
                     TRD |  -298.5147   298.5381    -1.00   0.325    -908.2107    311.1814
                   WINTM |   13.94025   210.2766     0.07   0.948    -415.5018    443.3823
                 ALLSTAR |   506.9222   235.8301     2.15   0.040     25.29297    988.5515
                    XPAN |  -1111.609   665.9898    -1.67   0.106    -2471.741    248.5239
                   _cons |   12572.83    7250.27     1.73   0.093      -2234.2    27379.85
            ------------------------------------------------------------------------------
            
            . estat vif
            
                Variable |       VIF       1/VIF
            -------------+----------------------
                  POINTS |    377.44    0.002649
                     FGA |    237.38    0.004213
                     FTA |     31.78    0.031467
                     AGE |     27.98    0.035739
                     YRS |     27.31    0.036620
                 AVGPNTS |     26.20    0.038168
                     MIN |     17.92    0.055813
                   GAMES |     15.92    0.062831
                      HT |     11.32    0.088334
                   GUARD |     11.27    0.088728
                      WT |     10.44    0.095759
                 FGPRCNT |      8.70    0.114968
                 ASSISTS |      6.41    0.155983
                REBOUNDS |      4.99    0.200284
                GAMESTRT |      4.54    0.220068
                 FORWARD |      4.50    0.222320
                 FTPRCNT |      3.75    0.266616
                     TRD |      2.54    0.394335
                 ALLSTAR |      2.25    0.444955
                  BLOCKS |      2.25    0.445133
                  STEALS |      2.00    0.499259
                   WINTM |      1.99    0.502488
                    RACE |      1.80    0.556271
                      EW |      1.55    0.644031
                    XPAN |      1.40    0.713135
            -------------+----------------------
                Mean VIF |     33.75
            
            . estat hettest
            
            Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
                     Ho: Constant variance
                     Variables: fitted values of SALARY
            
                     chi2(1)      =     6.35
                     Prob > chi2  =   0.0117
            
            . estat ovtest
            
            Ramsey RESET test using powers of the fitted values of SALARY
                   Ho:  model has no omitted variables
                              F(3, 27) =      4.01
                              Prob > F =      0.0175
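The VIF values can be reproduced by hand, which shows what they measure: the VIF of a predictor is 1/(1 - R2) from an auxiliary regression of that predictor on all the other predictors. A sketch for POINTS, assuming the data from post #5 are in memory:

```stata
* Auxiliary regression: POINTS on the other 24 predictors (SALARY excluded)
quietly regress POINTS YRS HT WT AGE GAMES GAMESTRT FORWARD GUARD MIN FGA ///
    FGPRCNT FTA FTPRCNT REBOUNDS ASSISTS STEALS BLOCKS AVGPNTS RACE EW TRD ///
    WINTM ALLSTAR XPAN

* VIF = 1/(1 - R2); with R2 this close to 1 the VIF becomes huge
display "R2 = " e(r2) "   VIF for POINTS = " 1/(1 - e(r2))
```

This should match the 377.44 reported by -estat vif- above, meaning roughly 99.7% of the variation in POINTS is explained by the other predictors.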
            Quasi-extreme multicollinearity is easy to explain here, as in the following case, where -AVGPNTS- is, in all likelihood, derived from -POINTS-:
            Code:
            . pwcorr POINTS AVGPNTS
            
                         |   POINTS  AVGPNTS
            -------------+------------------
                  POINTS |   1.0000
                 AVGPNTS |   0.8902   1.0000
            Hence, it is not surprising that the two are strongly correlated.
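The conjecture can be checked directly: if AVGPNTS is points per game, it should equal POINTS/GAMES up to the one-decimal rounding used in the data file. A sketch, assuming the data from post #5 are in memory (ppg_check and ppg_diff are hypothetical new variable names):

```stata
* Points per game computed from the raw totals
generate double ppg_check = POINTS/GAMES

* Discrepancy between the stored average and the computed one
generate double ppg_diff = AVGPNTS - ppg_check
summarize ppg_diff

* Players whose stored average differs by more than rounding would explain
list POINTS GAMES AVGPNTS ppg_check if abs(ppg_diff) > 0.05
```

Because a ratio is not a linear function of POINTS, the correlation stays below 1, so -regress- can still estimate both coefficients, just very imprecisely.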

            In sum:
            1) the -click and see- and -the more stuff on the right-hand side of the regression equation, the better the coefficients- approaches should be discouraged altogether. Instead, try to give a fair and true view of the data-generating process.
            2) the balance between the number of observations and the number of predictors is a fragile alchemy. The rule of thumb says "1 predictor per 10 observations", but other authoritative takes are even more restrictive (see the towering https://projecteuclid.org/download/p...aos/1176350710).
            3) generating an outcome table with Stata is easy and appealing; the harder work lies in regression postestimation.
            Kind regards,
            Carlo
            (Stata 19.0)
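The second point can be made concrete with the numbers from this thread. A sketch, assuming the data from post #5 are in memory:

```stata
* Refit the full model quietly to access its stored results
quietly regress SALARY YRS HT WT AGE GAMES GAMESTRT FORWARD GUARD MIN FGA ///
    FGPRCNT FTA FTPRCNT REBOUNDS ASSISTS STEALS BLOCKS POINTS AVGPNTS RACE  ///
    EW TRD WINTM ALLSTAR XPAN

* Observations per slope coefficient actually estimated
display "observations per predictor: " e(N)/e(df_m)

* Number of predictors the "1 per 10 observations" rule would allow
display "rule-of-thumb maximum: " floor(e(N)/10)
```

With 56 observations and 25 predictors there are only about 2.2 observations per predictor, against a rule-of-thumb ceiling of 5 predictors.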



            • #7
              Dear Richard Williams

              I ran the "update all" command as you suggested, and I am getting the following error message: "unexpected end of file r(612);"



              • #8
                Rochan:
                Was your Internet connection stable during the update?
                Whatever the reason, try -update- again.
                Kind regards,
                Carlo
                (Stata 19.0)



                • #9
                  Rochan, like Carlo, my general advice if you get an error like this is to try again later. You can also update without Internet access; see

                  https://www.stata.com/support/updates/

                  If you don't have your own personal copy of Stata, you may be at the mercy of site administrators to make the updates. If you have a pirated copy (hopefully not!), you may be out of luck. In any event, something definitely seems wrong with your current Stata software, and if you can't fix it yourself, perhaps your college's support staff can. Alternatively, if your installation has become corrupted, you could simply reinstall Stata from scratch.
                  -------------------------------------------
                  Richard Williams, Notre Dame Dept of Sociology



                  • #10
                    Thanks for your comments. Indeed, there was something wrong with my copy of Stata.



                    • #11
                      [Figure: residuals versus fitted values for the full (kitchen-sink) regression, KitchenSink_ResidualsVsFitted.png]

                      The plot suggests non-linearity and the presence of outliers. Can anything be said about homoscedasticity/heteroscedasticity from this plot?
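A plot like this one can be drawn with the official -rvfplot- postestimation command. As a rule of thumb, a funnel shape, where the vertical spread of the residuals widens or narrows as the fitted values change, is the usual visual hint of heteroskedasticity, whereas a curved pattern points to nonlinearity. A sketch, assuming the data from post #5 are in memory (yhat and ehat are hypothetical variable names):

```stata
quietly regress SALARY YRS HT WT AGE GAMES GAMESTRT FORWARD GUARD MIN FGA ///
    FGPRCNT FTA FTPRCNT REBOUNDS ASSISTS STEALS BLOCKS POINTS AVGPNTS RACE  ///
    EW TRD WINTM ALLSTAR XPAN

* Residuals against fitted values, with a horizontal reference line at zero
rvfplot, yline(0)

* The same quantities saved as variables for closer inspection
predict double yhat, xb
predict double ehat, residuals

* With a constant in the model, OLS residuals average to zero
summarize ehat
```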



                      • #12
                        This handout suggests some ways of identifying and dealing with heteroskedasticity. One of the things I always stress is that, if heteroskedasticity seems to be present, you should first think about whether model misspecification may be causing the problem.

                        https://www3.nd.edu/~rwilliam/stats2/l25.pdf
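Once the specification has been thought through, the usual checks and remedies are short. A sketch with official -regress- postestimation commands, assuming the data from post #5 are in memory; the local macro name xvars is illustrative:

```stata
* Keep the long predictor list in a local macro
local xvars YRS HT WT AGE GAMES GAMESTRT FORWARD GUARD MIN FGA FGPRCNT FTA ///
    FTPRCNT REBOUNDS ASSISTS STEALS BLOCKS POINTS AVGPNTS RACE EW TRD      ///
    WINTM ALLSTAR XPAN

quietly regress SALARY `xvars'

* Breusch-Pagan test using the fitted values (the default), as in post #6
estat hettest

* The same test using all right-hand-side variables instead
estat hettest, rhs

* Heteroskedasticity-robust standard errors: the coefficients are unchanged;
* only the standard errors, tests and confidence intervals differ
regress SALARY `xvars', vce(robust)
```

With only 30 residual degrees of freedom, robust standard errors may themselves be imprecise; -regress- also offers vce(hc2) and vce(hc3), which are often suggested for small samples.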
                        -------------------------------------------
                        Richard Williams, Notre Dame Dept of Sociology

